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COMPUTATIONAL  COMPLEXITY  OF  FOURIER  TRANSFORMS  OVER  FINITE  FIELDS* 


by 


F.  P.  Preparata 


* 

and  D.  V.  Sarwate 


Abstract 


In  this  paper  we  describe  a method  for  computing  the  Discrete  Fourier 
Transform  (DFT)  of  a sequence  of  n elements  over  a finite  field  GF(pm) 
with  a number  of  bit  operations  0(nm  log(nm)  • P(q))  where  P(q)  is  the 

number  of  bit  operations  required  to  multiply  two  q-bit  integers  and 
q = 2log2n  + 4 log2m  + 41og2P«  This  method  is  uniformly  applicable  to  all 
instances  and  its  order  of  complexity  is  not  inferior  to  that  of  methods 
whose  success  depends  upon  the  existence  of  certain  primes.  Our  algorithm 
is  a combination  of  known  and  novel  techniques.  In  particular  the  finite- 
field  DFT  is  at  first  converted  into  a finite  field  convolution;  the  latter 
is  then  implemented  as  a two-dimensional  Fourier  transform  over  the 
complex  field.  The  key  feature  of  the  method  is  the  fusion  of  these  two 
basic  operations  into  a single  integrated  procedure  centered  on  the  Fast 
Fourier  Transform  algorithm. 


COMPUTATIONAL  COMPLEXITY  OF  FOURIER  TRANSFORMS  OVER  FINITE  FIELDS 
I.  Introduction 

The  discrete  Fourier  Transform  (DFT)  over  a finite  field  occurs  in 
many  applications.  It  occurs,  under  the  name  of  Mattson-Solomon  polynomials, 
in  the  theory  of  error-correcting  codes  [l],  [2],  Also,  in  applications  of 
BCH  error-correcting  codes,  the  computation  of  the  syndrome  of  the  error 
pattern  and  the  determination  of  the  locations  and  values  of  the  errors  all 
correspond  to  the  computation  of  the  DFT  of  a sequence  of  elements  from  a 
finite  field.  The  DFT  over  a finite  field  has  been  suggested  as  a means  of 
computing  exactly  the  results  of  the  convolution  of  sequences  of  integers 
[3-8]  and  for  implementing  the  multiplication  of  very  large  integers  [3],  [9]. 
For  these  reasons,  it  is  desixable  to  devise  efficient  methods  of  computing 
the  DFT  over  a finite  field. 

Pollard  [3]  has  shown  that  an  algorithm,  which  is  the  finite  field 
analogue  of  the  well  known  Fast  Fourier  Transform  (FFT)  algorithm  [12],  can 
be  applied  to  computation  of  the  DFT  over  a finite  field.  This  algorithm 
requires  0 (n (n^+n^d- . . .+n^) ) arithmetic  operations  in  the  field  to  compute  the 
DFT  of  a sequence  of  n elements  where  n = n^*^* . . . ’n^.  Thus,  the  finite- 
field  FFT  algorithm  is  efficient  only  when  n is  highly  composite,  and  in  this 
case  the  algorithm  requires  0(n  log  n)  arithmetic  operations  in  the  field. 
Several  authors  [6-9]  have  considered  special  cases  of  the  FFT  for  particular 
applications,  using  special  properties  of  carefully  chosen  fields  to  achieve 
computational  efficiency.  For  the  general  case,  however,  the  efficiency  of 
the  FFT  algorithm  depends  upon  the  vagaries  of  the  factorization  of  n. 


In  this  paper,  we  present  a method  of  computing  the  DFT  in  a finite 


field  that  is  uniformly  applicable  to  all  finite  fields.  The  method  used  is 
based  on  some  ideas  that  have  been  discussed  in  the  past  ([3]  and  [ 9 - 1 1 ] ) as 
well  as  on  some  novel  techniques.  The  basic  idea  is  to  convert  the  computation 
of  the  DFT  of  a sequence  of  length  n into  the  computation  of  the  cyclic  convo- 
lution of  two  appropriately  defined  sequences.  The  techniques  for  doing  this 
are  due  to  Bluestein  [10]  (also  briefly  discussed  by  Pollard  [3]  for  application 
in  certain  special  cases  only)  and  to  Rader  [ 1 1 ] . The  main  feature  of  these 
techniques  is  that  the  length  of  the  sequences  to  be  convolved  can  be  chosen 
as  any  integer  N > 2n-l.  Since,  by  virtue  of  the  well-known  convolution 
theorem,  convolutions  can  be  implemented  via  Discrete  Fourier  Tranforms  over 
a field  which  we  are  now  free  to  choose,  N will  be  selected  as  a power  of  2 for 
optimal  FFT-implementation  of  the  transforms. 

The  latter  transforms  are  computed  in  the  complex  field.  In  order  to 
apply  the  complex  field  DFT  to  elements  of  a finite  field  GF(pm),  we  resort 
to  the  standard  representation  of  the  latter  as  polynomials  with  integer 
coefficients  and  of  degree  less  than  m.  The  convolution  of  sequences  over 
GF (pm)  could  then  be  recovered  by  combining  the  results  of  m^  convolutions  of 
integer  sequences;  however,  by  observing  that  the  latter  combination  can  be 
formulated  as  a cyclic  convolution,  the  entire  operation  is  implemented  via 
a two-dimensional  FFT.  Finally  by  noting  that  Bluestein's  technique,  i.e.  the 
conversion  of  the  DFT  to  a cyclic  convolution,  involves  a pre-conditioning 
and  a post-conditioning  consisting  of  multiplications  in  GF(pm),  a considerable 
computational  saving  is  obtained  by  combining  preconditioning,  convolution,  and 
postconditioning  into  a single  procedure.  In  this  procedure,  which  involves 
computations  in  several  transform  domains,  those  three  conceptual  steps  essentially 


blur  their  coniines.  The  result  is  that  the  DFT  of  sequences  of  n elements 
over  GF(pm)  can  be  computed  with  0 (nm*  log  (nm) • P(q) ) bit  operations,  where 
P (q)  is  the  number  of  bit  operations  required  to  multiply  two  q-bit  integers 
and  q = 21og2n  + 41og2m  + 41og2p. 

The  method  proposed  in  this  paper  basically  resorts  to  the  complex  field 
FFT  for  computing  integer  convolutions  and  departs  from  the  prevalent  trend. 

In  fact,  in  recent  years  several  authors  [3-8]  have  suggested  that  the  complex 
field  FFT  not  be  used  for  computing  integer  convolutions  because  of  the 
problems  of  round-off  error.  The  alternative  is,  of  course,  the  finite  field 
FFT  based  on  a careful  choice  of  the  finite  field.  We  have  not  followed  the 
latter  approach  for  the  following  reasons:  (i)  we  are  interested  in  techniques 

which  are  uniformly  applicable  to  all  cases,  whereas  the  success  of  the  FFT 
over  a finite  field  depends  upon  the  existence  of  primes  of  a special  type  [13] 
and,  as  is  easily  shown,  no  higher  efficiency  is  achieved,  even  for  the  case  of 
prime  fields;  (ii)  the  round-off  error  problem  is  entirely  controllable,  since 
real  numbers  can  be  economically  represented  so  that  in  all  cases  the  results 
will  be  correctly  interpreted. 


4 


II.  Discrete  Fourier  Transforms  via  Convolutions 


Let  p denote  a prime, 

tn  jji 

GF(p  ) the  finite  field  containing  p elements, 

...  , m , 

n a divisor  of  p - 1 , 

a a primitive  nth  root  of  unity  in  GF(pm). 

In  what  follows  we  shall  also  make  use  of  a short-hand  notation  for 

l t 

vectors:  (a^^  with  k<  a denotes  the  vector  < av,  > and  (0) 


k’  k+1’ 


denotes  the  vector  < 0,0,..., 0 > whose  t components  are  all  equal  to  0;  also 
uv  denotes  the  concatenation  of  two  vectors  u and  v. 

Let  a..  £ GF(pm)  for  i = 0,l,...,n-l.  The  Discrete  Fourier  Transform 


. n-  1 


n-1 


of  (3^)q  is  defined  to  be  the  sequence  (A^)^  where 


n-1 


AJ  ' So  V 


1J 


j g to, n-1] 

The  inverse  Fourier  Transform  to  (1)  is  given  by 


(1) 


-1  n_1 

a.  = (n)  £ A. a 1J 

j=0  J 

-1 


i € [0 , n- 1 ] 


(2) 


where  (n)  is  the  multiplicative  inverse  in  GF (pm)  of  the  field  integer  n. 

2 

As  is  well  known,  the  direct  method  of  computing  the  DFT  requires  0(n  ) 

, m 

arithmetic  operations  in  GF  (p  ).  If  n is  composite,  with  n = i 

then  the  finite-field  analogue  of  the  FFT  algorithm  over  the  complex  field 
can  be  used  to  compute  the  DFT  [3].  This  algorithm  requires  0 (n (n^+n^+ . . .n  ) ) 
arithmetic  operations  in  GF(pm).  Thus  for  highly  composite  n,  the  DFT  can  be 
computed  in  0(n  log  n)  arithmetic  operations  in  GF(pm). 


When  n is  not  highly  composite,  or  is  a prime,  the  saving  in  computation 
achieved  by  resorting  to  the  FFT  can  be  small  (or  nonexistent).  In  such  cases 
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the  DFT  problem  is  reformulated  as  a convolution  problem,  to  the  solution  of 
which  more  general  methods  can  be  applied.  There  are  two  main  techniques 
for  such  reformulation,  which  we  now  briefly  review  for  the  reader's  benefit. 

The  first  technique  is  due  to  Bluestein  [10],  and  is  based  on  the 
following  expression  for  A ^ : 


A.  = Z a Of 


n-1  > .2  A .2  . 2 

v a A 1 + J - ] 


E a-a 
i=0 


2 n_1  2 /••  -s2 

= pj  E (a  31  )C8'(J-1) 

i=0  1 


) , where  p = a‘ 


The  sum  in  (3)  can  be  recognized  to  be  a term  of  the  convolution  of  the 
sequence  (a^p1-2)^  1 with  the  sequence  (p-i2)”  It  is  easy  to  see  that  the 

convolution  of  these  sequences  can  be  obtained  from  the  cyclic  (or  periodic) 

...  f . j2  n-1  ^..N-n  , / -i2\n-l  (2n- 1) 

convolution  of  the  sequences  (a^p-1-  (0)  and  (p  * ) n (0)  , 

of  common  length  N,  where  N must  be  no  smaller  than  2n-l,  and,  in  view  of 

optimally  executing  the  FFT,  is  conveniently  chosen  as  a power  of  2.  Thus, 

N = 2S,  where  s > flog2  (2n-l)l  . 

Notice,  however,  that  the  two  length-N  sequences  to  be  convolved  are  not 

always  over  GF (p  ).  Indeed,  they  are  over  GF(p  ) in  the  following  two  cases: 

(i)  p=2 , since  p = is  primitive  of  order  n in  GF  (pm) ; (ii)  p^2  and 

2n| (pm-l),  since  p is  primitive  of  order  2n  in  GF(pm).  Otherwise,  when  p^2  and 

2n  f(pm-l),  p is  a primitive  2nth  root  of  unity  in  GF(p2m)  and  we  have  a convo- 

2m 

lution  of  sequences  over  GF(p  ).  In  this  case  we  carry  out  the  computation  in 


1 ^ 

this  larger  field.  However,  the  result  of  multiplying  by  the  corresponding 

term  of  the  convolution  will  be  an  element  of  GF(pm).  Thus  in  the  sequel  we 

shall  simply  refer  to  convolutions  of  sequences  over  GF (pm) , with  the  under- 

2 m 

standing  that  whenever  this  field  must  be  replaced  by  GF(p  ),  the  complexity 
analysis  is  correspondingly  modified. 


A second  technique  to  compute  DFTs  via  convolution  is  due  to  Rader  [11], 

This  technique  is  unfortunately  restricted  to  the  case  in  which  n is  a 

prime,  but  in  some  instances  it  is  slightly  more  efficient  than  the  more 

general  Bluestein's  technique,  as  illustrated  below.  Let  the  integer  0 be 

a primitive  root  of  unity  modulo  n and  let  0(i)  be  the  smallest  integer 

( i ) 

such  that  ev'  mcd  n = i for  i € [l,n-l].  Then 


Aj  ' ao  3 z V 

-*  i — i 


l(e*<j>mod  n)  ' a0  iZ1  a(e®(i)mod  n)  a 


,0(1)  + 0 (j) 


Since  ( 0(i))^  * is  simply  a permutation  of  (i)”  \ we  can  set  £ = 0(j)  and 
use  k = 0(i)  as  the  index  of  summation  to  get 


(9^mod  n) 


n-1 

E 3 k 
k=l  (0  mod  n) 


This  shows  that  the  sequence  (A 


- a ) is  the  cyclic  correlation 

n)  0 1 


of  the  sequences  (a^R^^  )”  1 and  ( 1 of  length  n-1.  We  obtain  a 

cyclic  convolution  by  reversing  one  of  the  sequences.  The  advantage  of  Rader's 


T 


method  is  that  both  sequences  are  always  over  GF(p  ) and  are  shorter. 

k 

In  particular,  if  the  prime  n happens  to  be  of  the  form  2+1,  Rader's 


method  gives  a convolution  of  sequences  of  length  2 whereas  Bluestein's 

,k+2 


method  would  give  a convolution  of  sequences  of  length  2 


In  this  paper 


we  shall  not  further  consider  Rader's  method  due  to  its  limited  applicability, 
and  marginal  superiority  when  applicable. 

The  preceding  discussion  indicates  that  to  compute  the  OFT  of  a 

t . n ~ 1 . m . t f . 

sequence  (3^)q  over  GF (p  ) using  (3)  we  must 

(i)  construct  the  sequences  (g^)g  and  (a^g*^)”  1 

and  extend  the  latter  two  to  length  N with  appropriate  strings 
of  zeros  ; 

( (ii)  compute  the  cyclic  convolution  of  the  latter  two  sequences; 

.2 

(iii)  multiply  by  gJ  , j € [0,n-l],  the  corresponding  term  of  the 
convolution. 

In  the  next  section  we  shall  show  how  the  preceding  three  steps,  which 


are  separately  stated  for  conceptual  convenience,  can  be  integrated  into  a 
single  efficient  computational  procedure. 
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III.  An  Integrated  Procedure  For  DFT  Implementation 

The  objective,  embodied  by  Bluestein's  and  Rader's  techniques,  of 
reformulating  DFTs  as  cyclic  convolutions  of  sequences,  is  that  the  common 
lengths  of  the  latter  can  be  chosen  so  that  the  ^FT  technique  is  always 
optimally  applicable.  In  fact,  as  is  very  well  known,  efficient  computation 
of  convolutions  is  based  on  the  convolution  theorem  [3],  which  states:  Let 

(w^)^  * denote  the  result  of  cyclically  convolving  (x.)q  ^ and  (jOq  i.e. 

N-l 

Wj  = Xi  y((j-D)  for  j € [O.N-l] 

where  ((j-i))  denotes  (j-i)  mod  N.  If  (W^)^  \ (x^)q  and  j(Y^)q  ^ denote 
their  respective  Fourier  Transforms,  then 

Wj  = Xj  Yj  for  j <E  [O.N-l].  (4) 

Suppose  at  first  that  GF(pm)  contains  an  Nth  root  of  unity.  We  can  find 
(X^Jq  1 and  (Y^q  *with  0(N  log  N)  arithmetic  operations,  find  the  Wj's  from  (4) 


tions,  all  in  GF(pm).  This  implies  that,  when  GF(pm)  contains  an  Nth  root  of 
unity,  the  Fourier  Transform  of  a sequence  of  length  n can  be  found  in 
0(n  log  n)  arithmetic  operations  in  GF(pm).  However,  recall  that,  by  the 
original  formulation  of  the  DFT  problem,  GF(pm)  is  assumed  to  contain  an  nth 
root  of  unity.  Thus  we  must  have  both  n| (pm- 1)  and  N|(pm-1).  Now,  notice 
that  N,  a power  of  2 , is  highly  composite,  while  n is  presumably  not  so; 
therefore,  if  n and  N have  a very  small  G.C.D.,  then  we  may  reasonably 
conclude  that  n is  or  smaller;  thus  the  event  that  GF(pm)  contains 

an  Nth  root  of  unity  is  likely  to  be  quite  rare. 


1 


On  the  other  hand,  it  has  been  suggested  in  a similar  context  [3]  that 

since  an  Nth  root  unity  will  exist  in  some  extension  field  of  GF(pm)^, 

the  DFT  can  be  found  in  0 (n  log  n)  arithmetic  operations  in  this  extension 

field.  This  approach  can  be  quite  deceptive,  however,  because  the  extension 

field  could  be  very  large  and  the  complexity  of  the  arithmetic  could  far 

2 

outweigh  the  reduction  of  the  number  of  operations  from  0 (n  ) to  0(n  log  n) , 
as  the  following  examples  indicate. 

Example  1.  Take  p = 3,  m = 3,  n = 26.  Since  2n|^(pm-l),  we  have  to 
compute  a convolution  of  sequences  of  length  2^  over  GF(3^).  Now,  a 
64th  root  of  unity  exists  in  GF(3^);  however,  the  smallest  extension 
field  of  GF(3^)  containing  a 64  th  root  of  unity  is  GF(3^). 


Example  2,  Take  p=2,m=5,n=  1.  Here  we  have  to  compute  a con- 

4 ( l') 

lution  of  sequences  of  length  3 ' . An  81st  root  of  unity  exists 


vo 


in 


54  5 

GF (2  ) and  the  smallest  extension  field  of  GF(2  ) containing  an  81st 

270 

root  is  GF (2  ) . 

Actually,  one  can  use  any  highly  composite  integer  N',  satisfying 
N1  > 2n-l  and  p ^ N1,  as  the  length  of  the  sequences  to  be  convolved; 
and  try  to  find  a (N')throot  of  unity  in  an  extension  field  of  GF(pm). 
Thus,  we  have 


(1)  A word  of  caution  is  in  order.  No  finite  field  of  characteristic  2 
can  contain  an  Nth  root  of  unity.  To  avoid  this  difficulty,  when  p = 2 

g 

one  could  choose  N = 3 > 2n-l,  still  permitting  an  efficient  implementation 
of  the  FFT. 
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Example  2.  (continued).  Choose  N1  = 63  > 61.  A 63rd  root  of  unity 

exists  in  GF(2^)  and  the  smallest  extension  field  of  GF(2^)  containing 
30 

a 63  rd  root  is  GF(2  ). 


The  preceding  discussion  indicates  that  computation  of  the  convolution 
in  GF(pm)  or  its  extensions  is  not  viable  in  general.  What  is  desirable 
instead  is  a method  universally  applicable  to  all  instances  of  p and  m and 
of  easily  determinable  complexity,  although  for  specific  isolated  choices  of 
p and  m other  approaches  may  be  slightly  moie  efficient. 

To  this  end  we  propose  to  transform  the  problem  of  convolving  sequences 
over  GF(pm)  into  the  problem  of  convolving  arrays  of  integers,  i.e.  natural 
numbers.  This  approach  is  not  novel:  in  fact  it  was  proposed  by  Pollard 

([3],  sect. 8)  in  connection  with  the  use  of  Bluestein's  method  for  the 
very  particular  case  m = 1 and  2n|(p-l),  i.e.,  for  a prime  field  containing 
a 2n-th  root  of  unity.  Both  restrictions,  however,  are  unnecessary.  In  fact, 
we  have  shown  in  Section  II  that  if  the  field  does  not  contain  a 2n-th  root  of 
unity,  we  merely  need  to  compute  the  convolution  in  a somewhat  larger  field. 
The  condition  m = 1 was  imposed  so  that  the  sequences  could  be  treated  as 
sequences  of  integers  in  the  range  [0,p-l]  and  convolved  by  the  methods  of 
[3,  section  3].  We  now  show  that  this  restriction  can  be  easily  removed. 

First  of  all,  we  review  the  standard  representation  of  elements  of  GF(pm) 
as  polynomials.  Let  j be  a root  of  a monic  irreducible  polynomial  g(z)  of 
degree  mover  GF (p).  Then  every  element  x^  £ GF(pm)  can  be  represented  as  a 

polynomial  x. (r)  of  degree  less  than  m in  Q , i.e. 

1 


xi<c>  = xi,kC  ’xi,k€GF(p) 
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As  is  well-known,  addition  in  GF(p  ) corresponds  to  addition  of  polynomials 
over  GF(p),  and  multiplication  in  GF(p  ) corresponds  to  multiplication  of  poly- 
nomials over  GF(p)  modulo  g(r). 

N-l 

As  before,  let  (w^)q  denote  the  result  of  the  cyclic  convolution  of 
Xi  0 and  «i^Q  ’ Then  we  nave 


“j  ‘ t50  v«j-t» 

and,  using  the  representation  (5)  of  elements  of  GF (pm)  as  polynomials,  we 


<j(C)  = E^  (x.(^)  y((j-i))(C))  mod 


N-l  2m-2  t 


E ( E ( E x y((.  0)  t.R)  £ ) ">od  g(£) 
i=0  t=0  k=0  K 


where  x.  , and  y ...  ...  , are  taken  to  be  zero  if  the  second  subscript 

i,k  ((j-i)),t-k 

exceeds  m-1.  Thus, 


2m-2  t N-l 


Wj(^)  = 


[ I ( E E x t.k)  C j mod  g(') 

t=0  k=0  i=0  1,1C  i;;,t  k 


( E z i t m°d  g(C) 

t=0 


t N-l 

where  z.  ~ T E xj  k t-k  ^ gf(p)* 

J,t  k=0  i*o  i,k  L^*t  k 


N-l  N-l 

Now  suppose  we  treat  all  the  terms  of  (x^  and  (y^  as  element! 

of  Z,  the  set  of  the  integers,  rather  than  of  GF(p).  With  this  stipulation, 
we  define  the  integers 


i 


t N-l 


z,  t = E ( E x k y/zi.ij)  t.k  ) for  j 6 [O.N-lj 

J’  k=0  i=0  ’ UJ  L))>t  k (6) 

and  t £[0,2m-2] 

and,  obviously,  z.  mod  p = z . The  right  side  of  (6)  has  the  form  of 
J > t j , t 

a double,  or  two-dimensional,  convolution  which  is  a periodic  convolution  in 
one  dimension  and  an  aperiodic  convolution  in  the  other.  The  latter  can  be 
easily  converted  into  a periodic  convolution  by  extending  the  range  of  k and 
t to  [ 0 , M- 1 ] where  M > 2m-l.  Letting  [[t-k]]  denote  (t-k)  mod  M,  (6)  can  be 
rewritten  as 


M-l  N-l 

’ J0  J0  Xi,k  y((j-i)),  [[t-k]] 


It  is  convenient  to  think  of  the  set  [x.  , }, 0 i ^ N-l,  0 ^ k ^ M-l 

i , k 

as  a two-dimensional  N x M array  [x  "l,  and  of  (7)  as  an  array  convolution. 

L i ,kJ 

Let  [X.  ] denote  the  two-dimensional  Fourier  Transform  of  Tx  1 over  the 

J >£  L i,k  1 

complex  field,  i.e., 


M-l  N-l 

i j k£ 

' E E xi  k V % 
k=0  i=0  ’ 


J2tt/N 


J2tt/M 


where  = e and  = e are  primitive  roots  of  unity  of  order  N 

and  M respectively  in  the  complex  field.  If  [Z.  ] and  [Y.  ] denote  the 

J >1  J»j l 

transforms  of  the  arrays  [z^  and  [ y ^ ^] , respectively,  then  we  have 


Zj,t  = Xj,t  Yj,t 


for  j e [ 0 , N—  1 ] and  t <E  [O.M-l]  (9) 


Recall  from  Section  II  that  one  of  the  two  sequences  to  be  convolved  is 
(a^g^)^  ^ (0)N  n,  i.e.  it  is  the  term-by-term  product  of  the  two  sequences 


. Representing  the  field  elements  as  polynomials 


/ \ n ~ i . w “u  1/  N n-  i /r. . N-n 

(ai)Q  (0)  and  (31  )Q  (0) 

■ 2 

as  in  (5),  the  element  a 31  is  a polynomial  of  degree  (2m-2)  before  it  is 

reduced  mod  g(£).  If  we  neglect  to  do  this  reduction,  the  convolution 

in  (3)  will  give  a polynomial  of  degree  0m-3)and  the  post-convolution  multi- 

plication  by  3J  will  give  a polynomial  of  degree  (4m-^.  Reducing  this  last 

polynomial  mod  g(r)  produces  the  same  result  as  a reduction  mod  g(£)  after 

r 

each  multiplication.  With  this  in  mind,  we  choose  M = 2 where  r = 

riog0(4m-3)l  and  define  the  N X M arrays  [a,  ,],  [b.  ] and  fy  1 corres- 

2 y 1 ,k  1 ,k  1 i,kJ 

,.  . , . n- 1 .. N-n  , i2.n-l,„.N-n  , . _*2.n-l  .„.N-2n+l 

ponding  to  the  sequences  (a.)Q  (0)  , (3 1 )Q  (0)  and  (3  1 ^ (0) 

Let  us  also  define  the  row-Fourier  Transform  (RFT)  of  the  array  [x.  ] as 

the  array  [x!  ] where 

i 1 SL 


M-l 


X , **  E X • L,  * “M 

i»A  k=0  1>k  M 


(10) 


and  the  column-Fourier  Transform  (CFT)  as  the  array  [xV  ,]  where 

J 


N-l 


c",  = £ X.  . Q 

j,k  it0  1 > k WN 


ij 


(ID 


Using  the  notation  [x!  ] = RFT[x  ,]  and  [x'.'  ] =»  CFT[x.  ,]  , we  have 

1,/  i ,k  j,k  i,K 


CFT [ RFT [x  ]]  = RFT [CFT[ x ]] 

1 ) K 1 , K 

■ CRrTi-i,ki 

■ 

where  X.  was  defined  in  (8). 

J .£ 

We  thus  have  the  following  algorithm. 
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Algorithm 


Step  1. 


Step  2. 


- '"[■i.ki 


(Comment:  The  rows  of  the  array  [ x_! 

i *Z 

representations  of  the  polynomials  a^p 


] are  the  transform  domain 


Step  3. 

rxj,tJ  - CFIK,ti 

Step  4. 

- CRFTtyi,k 

Step  5. 

C*JJYja 

(Comment:  [Z,  .1  is  the  transform  domain  representation 
J 

of  the  convolution  in  (3)). 


Step  6.  [z'  ]«-CFT'l[Z  ] 

L >jL  J 

(Comment:  The  rows  of  the  array  [z]  } are  the  transform 

i»j l 

domain  representations  of  the  polynomials  defined  by 

n-1  .2  ..2 

£ (a  pJ  ) ). 

j=0  J 


Step  7. 

ru!  1 <-  fz ! -b!  1 

1 i»£J  L i.A 

Step  8. 

[u  ] ♦-  RFT-1[U'  1 
i ,i 

(Comment : The  array  [u^  represents  the  desired 

Fourier  Transform  as  polynomials  of  degree  4m-4 
with  integer  coefficients.) 


I 

I 


IV.  Performance  Evaluation  oi  the  Algorithm. 
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Tlio  tol lowing  analysis  is  based  on  the  so-called  "logarithmic  model" 

( [4]>  Chapter  1),  that  is,  the  time  required  by  the  algorithms  will  be 
estimated  in  terms  ol  "bit  operations"  rather  than  of  "operand  operations" 

(as  is  the  case  with  the  so-called  "uniform  model").  This  choice  is  justified 
on  the  grounds  that  the  length  of  the  operands  used  by  the  algorithms  depends 
upon  the  parameter  n,  the  original  sequence  length.  Thus  assuming  a fixed 
operand  size  would  be  erroneous;  on  the  other  hand  the  running  time  on  any 
given  conventional  machine  of  the  random  access  memory  type  will  be  propor- 
tional to  the  bit  complexity  to  be  evaluated. 

A basic  operation  used  by  the  convolution  algorithm  described  in  the 
preceding  section  is  the  Fourier  Transform  of  a sequence  of  integers  in  the 
complex  field.  The  choice  of  this  field  is  obviously  motivated  by  the  fact 
that  a primitive  root  of  unity  exists  for  every  order  N;  thus  N can  be  chosen 
as  the  one  which  yields  an  optimal  implementation  of  the  FFT  algorithm. 
Resorting  to  the  complex- field  FFT  is  by  no  means  novel;  it  was  proposed  in 
the  past  in  connection  with  similar  problems,  notably  by  Schtfnhage  and 
Strassen  [9]  as  a device  for  fast  integer  multiplication.  The  main  problem 
encountered  with  this  approach  is  that  the  accuracy  with  which  complex  numbers 
are  represented  must  be  sufficient  to  guarantee  that  the  rounded-off  results 
exactly  yield  the  correct  integers.  We  now  estimate  in  detail  the  number  of 
bits  that  must  be  used  to  represent  the  powers  of  the  primitive  roots  of 
unity  and  A similar  analysis,  applied  however  to  a considerably 

simpler  situation,  can  be  found  in  [9]. 

Let  Ng=2t.  As  is  well-known,  the  ith  stage  (i=0,l t-l)in  the  execution 


' ft  ‘ 
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of  the  FFT  consists  of  a set  of  "butterfly  operations",  in  each  of  which  two 
complex  numbers  and  are  combined  to  yield  two  outputs  A^+^  and  B.+  j 
where 


1 A = A.  + 4QB. 
l+l  l * l 


1+1 


a1  - 


(12) 


where  Q is  some  power  of  fj  = e^  Tl>  Let  T|  be  the  maximum  error  in  any 

power  of  Q^o’  ^et  be  the  maximum  error  in  the  inputs  to  the  ith  stage  and 
let  be  the  maximum  modulus  of  the  inputs  to  the  ith  stage.  From  (12)  we 
have 

hn  =i  2h  * 2‘+\ 


and 


ei+l  ~ ei  + BiT'  + :iei  * LiT1  + 2*t  < 2 V1  + 2ei 


since  | B ^ | and  |q|  = 1.  Therefore,  we  have 


and 


€i+l  — <1+»2‘  V>  + 2‘+‘  «0 

«t  S c'2t‘1  V>  + 2t'o*HLo'"+  Vo 


is  the  bound  on  the  error  of  the  Fourier  Transform. 

For  the  inverse  Fourier  Transform,  the  equations  (12)  are  slightly 
modified.  Using  B , L,  and  e±  to  denote  the  corresponding  quantities  for 
the  inverse  transform,  we  have 


f + n^) 

i < K - 


(14) 


. -"T" 
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Therefore,  we  can  choose  LQ  = = . . . = Lt,  whence  J ± + B.7|  + fie j] 

^ - 1 - (2)  . 1 - 

- ei  + 2 LoT^  ' Hence>  we  have  pt  - *0  + 2l  V1' 

From  this,  it  is  easy  to  determine  the  maximum  moduli  of  and  maximum 
errors  in  the  results  produced  at  each  of  steps  1-8  of  the  algorithm.  These 
are  given  in  Table  1. 


Step 

Maximum  modulus  of  result 

Maximum  error  in  result 

Mp 

yrMP’i 

2 

mV 

2 2 

rM  p T) 

3 

2 2 
NM  p 

(r  + js)NmVt) 

4 

NMp 

^(rt-s)NMpTi 

i 5 

2 3 3 
NMp 

3 2 3 3 

( “r  + s)  N M p T] 

b 

2 3 3 
N M p 

3 2 3 3 

j (r  + s)N  M p T) 

7 

2 4 4 
NMp 

2 4 4 

(2r  + 3s)  N M p T) 

8 

2 4 4 
N M p 

j(5r  + 3s)  N2mVt] 

Table  1:  Maximum  moduli  of  and  errors  in 

complex  numbers  produced  in  Algorithm. 


The  results  at  step  8 must  have  a maximum  error  of  less  than  j in  order 
that  the  results  may  be  correctly  interpreted  as  integers.  It  follows  that 

2 4 4 

the  approximation  error  1]  in  the  roots  of  unity  must  satisfy  T|<1/ (5r+3s)N  M p , 

A 

(2)  When  the  final  maximum  modulus  L is  known,  we  can  derive  the  bound 

t a 11  a t-i  A 

et  ^ Cq  + 2 L^T) , which  is  based  on  the  inequality  2 . However, 

this  bound  does  not  improve  our  result  in  any  significant  way. 
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or,  equivalently,  the  number  of  bits  used  to  represent  the  powers  of  the 
roots  must  be  at  least 

q = '4  log2P  + log2  (5r  + 3s)l  + 2s  + 4r 

We  can  now  evaluate  the  performance  of  the  Fourier  Transform  algorithm. 

Let  P(q)  denote  the  number  of  bit  operations  required  to  multiply  two  a-bit 

numbers.  We  represent  the  real  and  imaginary  parts  of  both  the  powers  of  Q 

and  the  terms  A^,B  previously  defined  using  q bits  for  each,  and  note  that  a 

complex  field  multiplication  corresponds  to  four  multiplications  of  real  numbers. 

Note  that  the  integer  parts  of  the  results  grow  at  each  step  of  the  algorithm, 

but  concurrently  their  fractional  parts  lose  accuracy  so  that  the  significant 

operand  length  remains  constant.  It  follows  that  steps  1 and  8 require 

0(NM  log2  M • P(q))  bit  operations,  that  steps  3 and  6 require  0(NMlog2N-P(q)) 

bit  operations,  that  step  4 requires  0 (NMlog2 (NM) ‘P(o))  bit  operations,  and 

that  steps  2 and  7 require  0(NM*P(q))  bit  operations.  Obviously,  the  complexity 

of  Step  4 dominates  those  of  steps  1-8. 

In  Step  9,  MN  integers,  represented  by  q bits  each,  are  divided  by  p, 

which  is  represented  with  flog2pl<  q bits.  Thus,  the  complexity  of  this 

step  is  dominated  by  that  of  steps  2 or  6 and  hence  by  that  of  step  4.  Finally, 

in  Step  10,  N polynomials  over  GF(p)  of  degree  4m-4  are  divided  by  a polynomial 

of  degree  m.  Let  Dp(m)  denote  the  number  of  bit  operations  required  to  divide 

a polynomial  of  degree  2m  by  a polynomial  of  degree  m in  GF (p) . Then,  Step  10 

requires  at  most  0(N*D  (2m))  bit  operations.  If  m«  N,  i.e.,  m is  0(log  N)  or 

P 

less,  then  a long  division  method  can  be  used  at  step  10  and  the  complexity 

2 

of  Step  10  is  0(N  m P(log  p))  which  is  dominated  by  the  complexity  of  Steps 
3 and  6.  On  the  other  hand,  if  m is  large  compared  to  N,  then  fast  polynomial 
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division  methods  ["4.,  f 5"  can  be  used  at  Step  10.  The  complexity  is  then 

0(NM  log  m P(log  m+  log  p))  bit  operations  and  this  is  dominated  by  the  complexity 

of  Steps  1 and  8.  We  thus  have 

Theorem:  The  Fourier  Transform  of  a sequence  of  n elements  of  GF(pm), 

m 

nj  (p  -1),  can  be  computed  with  0 (nm* log(nm) •P(q))  bit  operations  where  P(a) 
is  the  number  of  bit  operations  required  to  multiply  two  q-bit  numbers  and 
q 2:  21og2  n + 41og2  m + 41og2p. 

As  a general  observation,  note  that  the  number  of  arithmetic  operations 
depends  on  the  degree  of  the  extension  of  GF(p)  but  not  on  the  characteristic 
p itself.  The  latter  affects  the  bit  complexity  of  arithmetic  operations 
only.  The  following  special  cases  of  the  theorem  are  also  of  interest. 

(i)  If  m = 1,  the  bit  complexity  reduces  to  0 (n  log  n P(q))  where 

q — 2 log  n + 4 log  p < 6 log  p,  i.e. , P(q)  is  proportional  to  the 
complexity  of  arithmetic  in  GF(p).  Thus,  the  Fourier  Transform 
over  GF(p)  can  be  computed  using  0(n  log  n)  arithmetic  operations 
whose  bit  complexity  is  essentially  that  of  arithmetic  operations 
in  GF (p) . 

(ii)  If  p = 2 and  n = 2m- 1 , (which  is  a case  of  great  interest  in  coding 

theory  [l],[2])  then  P(q)  is  proportional  to  the  complexity  of 
m 

arithmetic  in  GF(2  ).  Thus,  the  Fourier  Transform  of  a sequence  of 
length  2 -1  over  GF(2m)  can  be  computed  using  0(n  log^n)  arithmetic 
operations  whose  bit  complexity  is  essentially  that  of  arithmetic 
operations  in  GF(2m). 

Remark:  The  algorithm  proposed  in  this  paper  can  also  be  used  (with 

appropriate  modifications)  to  compute  the  convolution  of  sequences  over  GF(pm) 
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